affect the accuracy of these estimates. For each combination of number of
abilities and number of items, item parameters were randomly drawn from a 
pool of 550 items from nationally standardized mathematics tests, with 1,000 
examinees simulated for each experimental condition. Results show that the 
proposed multidimensional approach gives more general outcomes, yielding 

1 Introduction

It is not unusual for several tests measuring different abilities to be given in one test administration. 
Although these tests may tap different latent abilities, the abilities are usually not independent of one 
another. For example, in cognitive tests such as NAEP, these abilities have high positive correlations, 
typically greater than 0.70 (Johnson & Carlson, 1994). However, a common practice in educational 
measurement is to estimate these abilities independently of each other. This paper proposes a more 
efficient method of estimating these abilities that takes into account the correlational structure of the 
abilities. The method uses a hierarchical Bayesian approach to simultaneous estimation of abilities 
based on a simple structure multidimensional item response theory model. 

2 Purpose 

The primary purpose of this paper is to investigate whether the simultaneous estimation of abilities
from different dimensions yields more accurate estimates. In addition, the paper examines how the 
number of dimensions, the number of items in each dimension, and the degree of correlation between 
abilities affect the accuracy of the estimates. 

3 Presentation of the Model 

To extend the three-parameter logistic (3PL) model (Lord, 1980) to the multidimensional context,
Reckase (1996) used the following generalization:

$$P(X_{ij} = 1 \mid \theta_i, \alpha_j, \beta_j, \gamma_j) = \gamma_j + (1 - \gamma_j) \frac{e^{\alpha_j' \theta_i + \beta_j}}{1 + e^{\alpha_j' \theta_i + \beta_j}} \tag{1}$$

where

$P(X_{ij} = 1 \mid \theta_i, \alpha_j, \beta_j, \gamma_j)$ is the probability of examinee $i$ responding to item $j$ correctly;

$X_{ij}$ is the response of examinee $i$ to item $j$ (0 = incorrect, 1 = correct);

$\theta_i$ is the ability vector of the examinee;

$\alpha_j$ is the vector of item parameters related to the discrimination power of the item;

$\beta_j$ is the parameter related to the difficulty of the item;

$\gamma_j$ is the pseudo-guessing parameter of the item;

$i = 1, \ldots, I$ (the total number of examinees); and

$j = 1, \ldots, J$ (the total number of items).

For this paper, simple structure is assumed (i.e., each item measures one dimension of ability
and thus $\alpha_j$ contains only one non-zero element).

Paper presented at the Annual Meeting of the National Council on Measurement in Education, April 2002, New
Orleans, LA

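The model in Equation 1 can be re-expressed as:

$$P(X_{ij(d)} = 1 \mid \theta_{i(d)}, \alpha_{j(d)}, \beta_{j(d)}, \gamma_{j(d)}) = \gamma_{j(d)} + (1 - \gamma_{j(d)}) \frac{e^{\alpha_{j(d)} \theta_{i(d)} + \beta_{j(d)}}}{1 + e^{\alpha_{j(d)} \theta_{i(d)} + \beta_{j(d)}}} \tag{2}$$

where

$X_{ij(d)}$ is the response of examinee $i$ to the $j$th item of dimension $d$;

$\theta_{i(d)}$ is the $d$th component of the vector $\theta_i$, i.e., $\theta_i = \{\theta_{i(d)}\}$;

$d = 1, \ldots, D$ (the number of dimensions);

$j(d) = 1, \ldots, J(d)$; and

$\sum_{d=1}^{D} J(d) = J$.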
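
As an illustration of the simple-structure 3PL response function, the following is a minimal Python sketch. The function name `prob_correct` and the item parameter values are hypothetical and chosen only for demonstration; they are not taken from the item pool used in this paper.

```python
import math


def prob_correct(theta_d, alpha, beta, gamma):
    """Probability of a correct response under the simple-structure 3PL form.

    theta_d : ability on the single dimension the item measures
    alpha   : discrimination parameter of the item
    beta    : parameter related to item difficulty (enters additively)
    gamma   : pseudo-guessing parameter (lower asymptote)
    """
    z = alpha * theta_d + beta
    logistic = 1.0 / (1.0 + math.exp(-z))  # algebraically equal to e^z / (1 + e^z)
    return gamma + (1.0 - gamma) * logistic


# Hypothetical item: alpha = 1.2, beta = -0.5, gamma = 0.2.
p_average = prob_correct(theta_d=0.0, alpha=1.2, beta=-0.5, gamma=0.2)
p_able = prob_correct(theta_d=2.0, alpha=1.2, beta=-0.5, gamma=0.2)
```

Under these assumed values, the probability rises with ability and never falls below the pseudo-guessing parameter.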

Refer to figure 1 for a graphical representation of the hierarchical structure of the model. 

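The item response $X_{ij(d)}$ has a likelihood $P_{ij(d)}$ given by

$$P_{ij(d)} = P(X_{ij(d)} = 1 \mid \theta_{i(d)}, \alpha_{j(d)}, \beta_{j(d)}, \gamma_{j(d)})^{X_{ij(d)}} \, \left(1 - P(X_{ij(d)} = 1 \mid \theta_{i(d)}, \alpha_{j(d)}, \beta_{j(d)}, \gamma_{j(d)})\right)^{1 - X_{ij(d)}}. \tag{3}$$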




Figure 1: Graphical Representation of the Model

Let $X_i = \{X_{i1(1)}, \ldots, X_{iJ(1)}, \ldots, X_{i1(d)}, \ldots, X_{iJ(d)}, \ldots, X_{i1(D)}, \ldots, X_{iJ(D)}\}$ represent the response
vector of examinee $i$. The corresponding likelihood of this vector is

$$P_i = \prod_{d=1}^{D} \prod_{j(d)=1}^{J(d)} P_{ij(d)}. \tag{4}$$

Finally, the likelihood of the data matrix $X$ is given by

$$L = \prod_{i=1}^{I} \prod_{d=1}^{D} \prod_{j(d)=1}^{J(d)} P_{ij(d)}. \tag{5}$$
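
The product over examinees, dimensions, and items can be sketched in Python as a sum of log-probabilities, which is numerically safer than multiplying many small numbers. The data layout below (each item stored as a tuple of dimension index, discrimination, difficulty-related parameter, and pseudo-guessing parameter) is an assumption made for this illustration only.

```python
import math


def log_likelihood(responses, thetas, items):
    """Log of the data-matrix likelihood under the simple-structure 3PL model.

    responses[i][j] : 0/1 response of examinee i to item j
    thetas[i][d]    : ability of examinee i on dimension d
    items[j]        : (d, alpha, beta, gamma), where d is the dimension the item measures
    """
    total = 0.0
    for x_i, theta_i in zip(responses, thetas):
        for x, (d, alpha, beta, gamma) in zip(x_i, items):
            z = alpha * theta_i[d] + beta
            p = gamma + (1.0 - gamma) / (1.0 + math.exp(-z))
            # A correct response contributes log p; an incorrect one contributes log(1 - p).
            total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


# Hypothetical example: two examinees, two dimensions, three items.
items = [(0, 1.0, 0.0, 0.2), (0, 1.4, -0.3, 0.2), (1, 0.9, 0.5, 0.2)]
thetas = [[0.5, -0.2], [-1.0, 1.1]]
responses = [[1, 1, 0], [0, 0, 1]]
ll = log_likelihood(responses, thetas, items)
```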

4 Markov Chain Monte Carlo Estimation 

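The parameterization of the prior distribution of $\theta_i$ (Gelman et al., 1995) is

$$\theta_i \mid \Sigma \sim \mathrm{MVN}(0, \Sigma) \tag{6}$$

$$\Sigma \sim \text{Inv-Wishart}_{\nu_0}(\Lambda_0^{-1}). \tag{7}$$

Of primary interest is the joint distribution of $\theta$ and $\Sigma$. Using the notation $X = \{X_{ij}\}$, $\theta = \{\theta_i\}$,
$\alpha = \{\alpha_j\}$, $\beta = \{\beta_j\}$, and $\gamma = \{\gamma_j\}$, this joint posterior can be expressed as

$$P(\theta, \Sigma \mid X, \alpha, \beta, \gamma) \propto P(X \mid \theta, \Sigma, \alpha, \beta, \gamma) \, P(\theta \mid \Sigma) \, P(\Sigma). \tag{8}$$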

The posterior distribution in Equation 8 cannot be evaluated in a straightforward manner (i.e., samples
cannot be drawn directly from the joint posterior distribution). Markov chain Monte Carlo (MCMC)
simulation is used to draw samples iteratively from the full conditional distributions $\theta \mid X, \Sigma, \alpha, \beta, \gamma$
and $\Sigma \mid X, \theta, \alpha, \beta, \gamma$ (Casella & George, 1992; Gamerman, 1997).

For each examinee, the full conditional distribution is

$$P(\theta_i \mid X_i, \Sigma, \alpha, \beta, \gamma) \propto |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2} \theta_i' \Sigma^{-1} \theta_i\right) P_i. \tag{9}$$



Although the distribution in Equation 9 is not a known distribution, samples can be drawn from it indirectly
by using the Metropolis-Hastings algorithm (Chib & Greenberg, 1995; Gilks et al., 1996; Tierney,
1994).
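
A single Metropolis-Hastings update for one examinee's ability vector might be sketched as follows. The paper does not state which proposal distribution was used; the symmetric Gaussian random-walk proposal, the step size, and all names and parameter values below are assumptions made for illustration. The inverse of $\Sigma$ is passed in directly so that the sketch needs only the standard library.

```python
import math
import random


def log_target(theta, x, items, precision):
    """Log of the full conditional of theta_i, up to an additive constant.

    precision is the inverse of Sigma; the |Sigma|^(-1/2) factor does not
    depend on theta and cancels in the acceptance ratio, so it is omitted.
    """
    n_dims = len(theta)
    quad = sum(theta[a] * precision[a][b] * theta[b]
               for a in range(n_dims) for b in range(n_dims))
    log_lik = 0.0
    for resp, (d, alpha, beta, gamma) in zip(x, items):
        p = gamma + (1.0 - gamma) / (1.0 + math.exp(-(alpha * theta[d] + beta)))
        log_lik += math.log(p if resp == 1 else 1.0 - p)
    return -0.5 * quad + log_lik


def mh_step(theta, x, items, precision, step, rng):
    """One random-walk Metropolis-Hastings update (assumed proposal, see lead-in)."""
    proposal = [t + rng.gauss(0.0, step) for t in theta]
    log_ratio = (log_target(proposal, x, items, precision)
                 - log_target(theta, x, items, precision))
    # Accept with probability min(1, target(proposal) / target(current)).
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return theta


# Hypothetical example: three items on two dimensions, identity covariance.
items = [(0, 1.0, 0.0, 0.2), (0, 1.5, -0.5, 0.2), (1, 0.8, 0.3, 0.2)]
x = [1, 0, 1]
precision = [[1.0, 0.0], [0.0, 1.0]]
rng = random.Random(0)
theta = [0.0, 0.0]
chain = []
for _ in range(500):
    theta = mh_step(theta, x, items, precision, 0.5, rng)
    chain.append(theta)
```

Because the proposal is symmetric, the Hastings correction term equals one and only the ratio of target densities appears in the acceptance step.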

The full conditional distribution of $\Sigma$ is

$$P(\Sigma \mid X, \theta, \alpha, \beta, \gamma) = P(\Sigma \mid \theta) \propto P(\Sigma) \, P(\theta \mid \Sigma). \tag{10}$$

With the use of the prior distribution and the hyperdistributions given in Equations 6 and 7, the full
conditional posterior distribution of $\Sigma$ is an $\text{Inv-Wishart}_{\nu_I}(\Lambda_I^{-1})$, where $\nu_I = \nu_0 + I$ and
$\Lambda_I = \Lambda_0 + \sum_{i=1}^{I} \theta_i \theta_i'$.

The full conditional distribution of $\Sigma$ is a known distribution and can be sampled directly.
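
The conjugate update of the inverse-Wishart parameters can be sketched in Python as below. Only the updated degrees of freedom and scale matrix are computed; drawing from the resulting distribution is left to whatever sampling routine is available. The function name and the example values are hypothetical.

```python
def inv_wishart_posterior(nu0, lambda0, thetas):
    """Return the updated degrees of freedom and scale matrix.

    nu0     : prior degrees of freedom
    lambda0 : prior scale matrix (list of lists, D x D)
    thetas  : current ability draws, one D-vector per examinee
    """
    n_dims = len(lambda0)
    lambda_post = [row[:] for row in lambda0]  # copy so the prior is not modified
    for theta in thetas:
        for a in range(n_dims):
            for b in range(n_dims):
                lambda_post[a][b] += theta[a] * theta[b]  # adds the outer product theta theta'
    nu_post = nu0 + len(thetas)
    return nu_post, lambda_post


# Hypothetical example with two examinees and two dimensions.
nu_post, lambda_post = inv_wishart_posterior(
    nu0=2,
    lambda0=[[1.0, 0.0], [0.0, 1.0]],
    thetas=[[1.0, 2.0], [0.0, 1.0]],
)
```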

For the present paper, each chain is iterated 10,000 times. The first 2,000 iterations are discarded 
and inference is based on the remaining 8,000 iterations. 

Two methods of estimating ability are employed. The first method, which is based on the MCMC
output, is called the multidimensional expected a posteriori (EAP-M) method and is computed as:

$$\hat{\theta} = E(\theta \mid X, \alpha, \beta, \gamma) \approx \frac{1}{8000} \sum_{t=2001}^{10000} \theta^{(t)}. \tag{11}$$
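
The averaging of retained draws can be illustrated with a short Python sketch. It assumes the draws for one examinee are stored as a list of ability vectors in iteration order; the function name is hypothetical.

```python
def eap_estimate(draws, burn_in=2000):
    """Average the MCMC draws that remain after discarding the burn-in iterations.

    draws : list of ability vectors, one per iteration (iteration 1 first)
    """
    kept = draws[burn_in:]
    n_dims = len(kept[0])
    return [sum(draw[d] for draw in kept) / len(kept) for d in range(n_dims)]


# Toy chain: 2,000 burn-in draws followed by two retained draws.
toy_chain = [[100.0]] * 2000 + [[1.0], [3.0]]
toy_estimate = eap_estimate(toy_chain)
```

With 10,000 iterations and a burn-in of 2,000, the same function averages the final 8,000 draws.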

The second is based on the expected a posteriori when the correlations between abilities are
assumed to be zero. Except for this assumption, this method is equivalent to the first (i.e., estimates
are also based on MCMC draws). This is called the unidimensional expected a posteriori
(EAP-U) and is computed as:

$$\hat{\theta} = E(\theta \mid X, \alpha, \beta, \gamma, \rho = 0). \tag{12}$$

Two methods are employed in estimating the underlying correlational structure between the
abilities. The first method estimates the correlations directly from the data using MCMC
simulation. The covariance matrix is estimated as $\hat{\Sigma} = E(\Sigma \mid X, \alpha, \beta, \gamma)$. This is similar to the
ability estimates, in that it is the average of the covariance matrices from the MCMC draws. The
estimated covariance is standardized to obtain the correlation estimates $\hat{\rho}$. The second estimate is
based on the two-step approach, where the estimate of the correlation is given by the correlation of
the estimated abilities using the EAP-U method (i.e., $R = \mathrm{Cor}(\hat{\theta})$). For example, the correlation
between $\theta_1$ and $\theta_2$ is estimated as $\hat{\rho} = \mathrm{Cor}(\hat{\theta}_1, \hat{\theta}_2)$.
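
Standardizing an estimated covariance matrix into a correlation matrix amounts to dividing each element by the product of the corresponding standard deviations. A minimal Python sketch, with a hypothetical function name and made-up input values:

```python
import math


def cov_to_corr(cov):
    """Convert a covariance matrix (list of lists) into a correlation matrix."""
    sd = [math.sqrt(cov[k][k]) for k in range(len(cov))]
    return [[cov[a][b] / (sd[a] * sd[b]) for b in range(len(cov))]
            for a in range(len(cov))]


# Made-up covariance matrix with variances 4 and 9 and covariance 2.
corr = cov_to_corr([[4.0, 2.0], [2.0, 9.0]])
```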

The accuracy of the ability and correlation estimates is gauged by comparing them to the
generating parameters. In addition, because multiple values are available for each ability estimation
method, two statistical measures that summarize the correspondence between the estimated and the
generated abilities are computed: the Pearson correlation and the root mean squared error (RMSE). Finally,
the effectiveness of the proposed method is assessed by computing its efficiency, defined as the ratio
of the average posterior variance of the ability estimates obtained using the EAP-U method to that
obtained using the EAP-M method.
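
The two summary measures can be sketched in Python as follows. The function names and the toy values are illustrative only.

```python
import math


def rmse(estimated, true):
    """Root mean squared error between estimated and generated abilities."""
    n = len(true)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, true)) / n)


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy values: generated abilities and their estimates for four examinees.
true_theta = [-1.0, 0.0, 0.5, 1.5]
est_theta = [-0.8, 0.1, 0.3, 1.2]
toy_rmse = rmse(est_theta, true_theta)
toy_corr = pearson(est_theta, true_theta)
```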

5 Design of the Study 

The factors investigated in this paper are: (i) the number of abilities, (ii) the number of items, and
(iii) the degree of correlation between the abilities. The number of abilities is 2 or 5, the
number of items is 10, 30, or 50, and the degree of correlation is 0.00, 0.40, 0.70, or
0.90. The levels of each factor are crossed completely to yield 24 experimental conditions.

For each combination of number of abilities and number of items, item parameters are randomly
drawn from a pool of 550 items obtained from nationally standardized mathematics tests.
For each experimental condition, 1,000 examinees are drawn from $\mathrm{MVN}(0, \Sigma)$, where

$$\Sigma = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \rho \\ \rho & \cdots & \rho & 1 \end{pmatrix}.$$

The responses of the examinees to the items are then simulated. The constraint on $\Sigma$ retains the structure
of the design and does not in any way affect the estimation process.
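
The crossed design and the equal-correlation structure can be sketched in Python as below. The ability draw uses a shared-factor shortcut that is valid only for an equal-correlation matrix with correlation between 0 and 1; it is an assumption of this sketch, not a description of how the abilities were generated in the study. All names and the seed are hypothetical.

```python
import itertools
import math
import random

N_ABILITIES = (2, 5)
N_ITEMS = (10, 30, 50)
CORRELATIONS = (0.00, 0.40, 0.70, 0.90)

# Fully crossed design: 2 x 3 x 4 = 24 experimental conditions.
conditions = list(itertools.product(N_ABILITIES, N_ITEMS, CORRELATIONS))


def equicorrelation_matrix(n_dims, rho):
    """Covariance matrix with unit variances and a common correlation rho."""
    return [[1.0 if a == b else rho for b in range(n_dims)] for a in range(n_dims)]


def draw_abilities(n_dims, rho, rng):
    """Draw one ability vector with unit variances and common correlation rho.

    Each component is sqrt(rho) * shared + sqrt(1 - rho) * unique, which gives
    variance rho + (1 - rho) = 1 and covariance rho between components.
    """
    shared = rng.gauss(0.0, 1.0)
    return [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(n_dims)]


rng = random.Random(2002)  # arbitrary seed for reproducibility
thetas = [draw_abilities(2, 0.70, rng) for _ in range(1000)]
```

Because only 1,000 vectors are drawn, the sample correlation of the generated abilities will generally differ slightly from the generating value.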

6 Results



6.1 Correlation estimates 

Tables 1 through 6 present the correlation estimates from the algorithm that uses the
correlational structure (MCMC) and from the two-step procedure. Although the generating values of the
correlation matrix are specified, the correlation matrix of the generated ability parameters is not
identical to the generating correlation matrix. Hence, the correlation estimates must be compared
with the generated correlation matrix, not the generating correlation matrix.

The tables demonstrate that the correlation is well estimated by either method when there is
no correlation between abilities. However, when the abilities are correlated, the two-step method
underestimates the correlation, whereas estimates based on MCMC are closer to the generated
values. Increasing the number of items increases the precision of the estimates for both methods.
Estimates using the two methods are not affected by increasing the number of abilities. This is to
be expected for the two-step approach since it ignores the additional information contained in the
correlation matrix. In general, additional precision can be expected for the MCMC estimates as
more abilities are considered. However, because the large number of examinees allowed for accurate
estimation of the correlations even with only two abilities, the additional information afforded by
adding more abilities became negligible.

Table 1: Correlation estimates for 2 dimensions and 10 items

                                                                      $\rho$
Method      Estimate                                           0.00    0.40    0.70    0.90
Generated   $\mathrm{Cor}(\theta_1, \theta_2)$                 0.00    0.42    0.70    0.90
MCMC        $\hat{\rho}$                                      -0.01    0.43    0.68    0.90
Two-step    $\mathrm{Cor}(\hat{\theta}_1, \hat{\theta}_2)$    -0.01    0.29    0.46    0.62



Table 2: Correlation estimates for 2 dimensions and 30 items

                                                                      $\rho$
Method      Estimate                                           0.00    0.40    0.70    0.90
Generated   $\mathrm{Cor}(\theta_1, \theta_2)$                 0.05    0.45    0.68    0.90
MCMC        $\hat{\rho}$                                       0.06    0.44    0.69    0.92
Two-step    $\mathrm{Cor}(\hat{\theta}_1, \hat{\theta}_2)$     0.05    0.38    0.59    0.78

Table 3: Correlation estimates for 2 dimensions and 50 items

                                                                      $\rho$
Method      Estimate                                           0.00    0.40    0.70    0.90
Generated   $\mathrm{Cor}(\theta_1, \theta_2)$                -0.06    0.39    0.70    0.90
MCMC        $\hat{\rho}$                                      -0.06    0.37    0.69    0.91
Two-step    $\mathrm{Cor}(\hat{\theta}_1, \hat{\theta}_2)$    -0.05    0.34    0.62    0.83



Table 4: Correlation estimates for 5 dimensions and 10 items

                                                                      $\rho$
Method      Estimate                                           0.00    0.40    0.70    0.90
Generated   $\mathrm{Cor}(\theta_1, \theta_2)$                -0.01    0.41    0.70    0.91
MCMC        $\hat{\rho}$                                       0.00    0.41    0.70    0.89
Two-step    $\mathrm{Cor}(\hat{\theta}_1, \hat{\theta}_2)$     0.00    0.27    0.44    0.59



Table 5: Correlation estimates for 5 dimensions and 30 items

                                                                      $\rho$
Method      Estimate                                           0.00    0.40    0.70    0.90
Generated   $\mathrm{Cor}(\theta_1, \theta_2)$                -0.01    0.44    0.70    0.90
MCMC        $\hat{\rho}$                                      -0.01    0.43    0.70    0.91
Two-step    $\mathrm{Cor}(\hat{\theta}_1, \hat{\theta}_2)$    -0.01    0.37    0.60    0.79



Table 6: Correlation estimates for 5 dimensions and 50 items

                                                                      $\rho$
Method      Estimate                                           0.00    0.40    0.70    0.90
Generated   $\mathrm{Cor}(\theta_1, \theta_2)$                -0.02    0.42    0.68    0.90
MCMC        $\hat{\rho}$                                      -0.02    0.43    0.68    0.89
Two-step    $\mathrm{Cor}(\hat{\theta}_1, \hat{\theta}_2)$    -0.02    0.39    0.62    0.81



6.2 Estimates of ability 

6.3 Correlation with true ability 

Tables 7 through 12 list the correlations between the true ability and the estimated ability. When
no correlation exists between abilities, the EAP-M and EAP-U estimates correlate equally well with
the generating parameters. But as the correlation between abilities increases, the EAP-M estimates
correlate more highly with the true ability whereas the EAP-U estimates are unaffected. As the
number of items increases, the correlation between the true ability and the estimated ability for
both methods also increases. Finally, increasing the number of abilities gives higher correlations for
the EAP-M estimates but has no impact on the EAP-U estimates.



Table 7: Correlations between the true and estimated abilities for 2 dimensions and 10 items

                      $\rho$
Method     0.00    0.40    0.70    0.90
EAP-M      0.81    0.83    0.84    0.87
EAP-U      0.81    0.82    0.81    0.82



Table 8: Correlations between the true and estimated abilities for 2 dimensions and 30 items

                      $\rho$
Method     0.00    0.40    0.70    0.90
EAP-M      0.92    0.93    0.93    0.94
EAP-U      0.94    0.93    0.92    0.92



Table 9: Correlations between the true and estimated abilities for 2 dimensions and 50 items

                      $\rho$
Method     0.00    0.40    0.70    0.90
EAP-M      0.95    0.95    0.96    0.96
EAP-U      0.95    0.95    0.95    0.95



Table 10: Correlations between the true and estimated abilities for 5 dimensions and 10 items

                      $\rho$
Method     0.00    0.40    0.70    0.90
EAP-M      0.80    0.83    0.86    0.91
EAP-U      0.80    0.81    0.79    0.80

Table 11: Correlations between the true and estimated abilities for 5 dimensions and 30 items

                      $\rho$
Method     0.00    0.40    0.70    0.90
EAP-M      0.92    0.93    0.94    0.96
EAP-U      0.92    0.93    0.93    0.93



Table 12: Correlations between the true and estimated abilities for 5 dimensions and 50 items

                      $\rho$
Method     0.00    0.40    0.70    0.90
EAP-M      0.95    0.95    0.96    0.97
EAP-U      0.95    0.95    0.95    0.95



6.4 Root mean squared error 

The number of abilities, number of items, and the degree of correlation affect the RMSE of the
EAP-M and EAP-U estimates in the same way that they affect the correlations between the true and
estimated abilities. That is, (a) the two methods yield equivalent results when the abilities are not
correlated; (b) a greater number of abilities or a greater degree of correlation between the abilities
improves the EAP-M estimates but does not affect the EAP-U estimates; and (c) an increase in the
number of items results in more precise estimates by both methods.



Table 13: RMSE of ability estimates for 2 dimensions and 10 items

                      $\rho$
Method     0.00    0.40    0.70    0.90
EAP-M      0.59    0.57    0.54    0.49
EAP-U      0.59    0.58    0.58    0.57



Table 14: RMSE of ability estimates for 2 dimensions and 30 items

                      $\rho$
Method     0.00    0.40    0.70    0.90
EAP-M      0.39    0.38    0.37    0.35
EAP-U      0.39    0.39    0.39    0.39

Table 15: RMSE of ability estimates for 2 dimensions and 50 items

                      $\rho$
Method     0.00    0.40    0.70    0.90
EAP-M      0.31    0.30    0.29    0.27
EAP-U      0.31    0.30    0.30    0.31



Table 16: RMSE of ability estimates for 5 dimensions and 10 items

                      $\rho$
Method     0.00    0.40    0.70    0.90
EAP-M      0.59    0.56    0.51    0.43
EAP-U      0.59    0.59    0.61    0.62



Table 17: RMSE of ability estimates for 5 dimensions and 30 items

                      $\rho$
Method     0.00    0.40    0.70    0.90
EAP-M      0.38    0.38    0.34    0.29
EAP-U      0.38    0.39    0.38    0.37



Table 18: RMSE of ability estimates for 5 dimensions and 50 items

                      $\rho$
Method     0.00    0.40    0.70    0.90
EAP-M      0.30    0.30    0.28    0.24
EAP-U      0.30    0.30    0.30    0.31



6.5 Efficiency 

The posterior variance is approximately equal to the squared standard error of the estimate. The 
average posterior variance for each method is obtained by: (a) computing the variance of the last 
8,000 draws for each examinee; (b) averaging the variances across the 1,000 examinees; and (c) 
averaging the variance again across the different abilities. Efficiency is defined in this paper as the 
average posterior variance of the EAP-U estimates over the average posterior variance of the EAP-M 
estimates. Thus, a ratio greater than 1.00 is interpreted as the EAP-M method being more efficient 
than the EAP-U method, and vice versa. The ratio also indicates the factor by which
the test length needs to be increased for the EAP-U estimates to have the same precision as the
EAP-M estimates obtained with the original test length.

When only two dimensions are concurrently considered, the efficiency of the EAP-M method is not
evident unless the abilities are very highly correlated (i.e., $\rho = 0.90$). Efficiency at this level ranges
from 1.25 to 1.40. Depending on the length of the test, this is equivalent to adding 3 to 12 items to
the test.
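
The three averaging steps and the conversion of an efficiency ratio into an equivalent number of added items can be sketched in Python as below. The sketch assumes the sample variance (divisor n - 1) for step (a) and follows the interpretation stated above that the ratio is the factor by which the test length must grow; the function names and data layout are hypothetical.

```python
def variance(values):
    """Sample variance of a list of draws (divisor n - 1, an assumption)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def average_posterior_variance(draws):
    """draws[i][d] holds the retained draws for examinee i on ability d."""
    n_dims = len(draws[0])
    per_ability = []
    for d in range(n_dims):
        per_examinee = [variance(examinee[d]) for examinee in draws]  # step (a)
        per_ability.append(sum(per_examinee) / len(per_examinee))     # step (b)
    return sum(per_ability) / n_dims                                  # step (c)


def efficiency(var_u, var_m):
    """Average posterior variance of EAP-U over that of EAP-M."""
    return var_u / var_m


def equivalent_added_items(n_items, eff):
    """Items that would have to be added for EAP-U to match EAP-M precision."""
    return n_items * (eff - 1.0)


# Reported efficiencies at a correlation of 0.90 with two dimensions.
added = [round(equivalent_added_items(n, e), 1)
         for n, e in ((10, 1.34), (30, 1.40), (50, 1.25))]
```

For the reported two-dimension efficiencies this gives about 3.4, 12.0, and 12.5 items, in line with the range of 3 to 12 items quoted above.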

When five dimensions are simultaneously considered, the efficiency of the EAP-M method is
evident for abilities that are reasonably highly correlated; efficiency ranges from 1.16 to 2.07. For
some tests, the precision of the EAP-M estimates is equivalent to the precision of the EAP-U
estimates obtained from tests that are twice as long. Depending on the original test length, this is
equivalent to adding 4 to 27 items to the test.

The increase in precision is less evident when long tests are used. This is consistent with the
results discussed earlier, which indicate that marginal improvement is slight when abilities are already
well estimated. Although efficiency may not be as high for long tests, the corresponding number of
additional items turns out to be larger. Finally, for a fixed level of correlation between abilities that is
greater than zero, the efficiency obtained from simultaneously using five dimensions is consistently
higher than the efficiency obtained from using only two dimensions.



Table 19: Posterior variance of ability estimates for 2 dimensions and 10 items

                          $\rho$
Method         0.00    0.40    0.70    0.90
EAP-M          0.34    0.33    0.31    0.25
EAP-U          0.34    0.34    0.34    0.34
Efficiency     1.00    1.03    1.10    1.34



Table 20: Posterior variance of ability estimates for 2 dimensions and 30 items

                          $\rho$
Method         0.00    0.40    0.70    0.90
EAP-M          0.15    0.15    0.14    0.11
EAP-U          0.15    0.15    0.15    0.15
Efficiency     1.01    1.01    1.12    1.40

Table 21: Posterior variance of ability estimates for 2 dimensions and 50 items

                          $\rho$
Method         0.00    0.40    0.70    0.90
EAP-M          0.09    0.09    0.09    0.07
EAP-U          0.09    0.09    0.09    0.09
Efficiency     0.99    1.02    1.07    1.25



Table 22: Posterior variance of ability estimates for 5 dimensions and 10 items

                          $\rho$
Method         0.00    0.40    0.70    0.90
EAP-M          0.36    0.33    0.25    0.17
EAP-U          0.36    0.36    0.36    0.36
Efficiency     1.00    1.09    1.45    2.07



Table 23: Posterior variance of ability estimates for 5 dimensions and 30 items

                          $\rho$
Method         0.00    0.40    0.70    0.90
EAP-M          0.14    0.14    0.12    0.08
EAP-U          0.14    0.14    0.14    0.14
Efficiency     1.00    1.05    1.23    1.86



Table 24: Posterior variance of ability estimates for 5 dimensions and 50 items

                          $\rho$
Method         0.00    0.40    0.70    0.90
EAP-M          0.09    0.09    0.08    0.06
EAP-U          0.09    0.09    0.09    0.09
Efficiency     0.99    1.03    1.16    1.54



7 Discussion 



The results show that the proposed multidimensional approach to simultaneous ability estimation
gives more general outcomes. The method gives results similar to those of the unidimensional method
when abilities are uncorrelated. When abilities are correlated, taking the correlation into account
can lead to noticeable improvements in ability estimates, especially when there are multiple short
tests and the underlying correlation is high. In addition, this hierarchical approach allows for direct
estimation of the correlation between the abilities. This obviates the need for a two-step approach
that leads to biased estimates (Little & Rubin, 1983; Mislevy, 1984), which in this case are
underestimates.

The multidimensional approach should be beneficial in many testing situations. The
administration of multiple tests during one sitting is not uncommon, and as Johnson and Carlson (1994)
reported, the different abilities measured by these tests are usually highly correlated. Although some
of the improvements from this approach are relatively modest, they can be achieved without much
additional cost (i.e., only the estimation process was changed in scoring the same data sets). In a
practical sense, use of this method means that, given a fixed number of items, ability estimates can
be made more precise, or given a desired level of precision, the number of items can be reduced
without loss of accuracy.

This method can also be applied to a single test composed of several subtests. Currently, there
is great interest in using test results for diagnostic purposes (i.e., determining the students' strong
and weak points). However, in many instances this objective is not realized because separate scores
cannot be reported due to insufficient reliability of the subtests, hence the continued reliance on
a single, more global composite score (Wainer et al., 2001). This could be a promising application
given the nature of the composite tests (i.e., multiple short sections that are highly correlated).

It should be noted that the assumption of simple structure does not limit the usefulness of 
the proposed method. On the contrary, the assumption makes the application of the method more 
straightforward in that it can be applied without changes in the item response models to existing 
tests that have already been calibrated. 

Future research might take a variety of directions. First, although the results show that the
proposed method works for simulated data sets, it is important to verify that it works for real-world
data as well. Second, the present paper uses item parameters with known values. The approach can
be broadened to include item parameter estimation, as was done by Patz and Junker (1999a, 1999b)
in the unidimensional IRT case. Finally, the proposed method can be tried with other item response
models, such as the generalized graded unfolding model of Roberts, Donoghue, and Laughlin (2000),
and in other testing contexts.